Engineering local optimality in Quantum Monte Carlo algorithms 
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Quantum Monte Carlo algorithms based on a world-line representation such as the worm algorithm 
and the directed loop algorithm are among the most powerful numerical techniques for the simulation 
of non-frustrated spin models and of bosonic models. Both algorithms work in the grand-canonical 
ensemble and can have a winding number larger than zero. However, they retain a lot of intrinsic 
degrees of freedom which can be used to optimize the algorithm. We let us guide by the rigorous 
statements on the globally optimal form of Markov chain Monte Carlo simulations in order to devise 
a locally optimal formulation of the worm algorithm while incorporating ideas from the directed loop 
algorithm. We provide numerical examples for the soft-core Bose-Hubbard model and various spin-S* 
models. 
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I. INTRODUCTION 



Monte Carlo methods have become a standard numer- 
ical tool in many branches of science, offering exact re- 
sults in a statistical sense In physics, some Monte 
Carlo algorithms still resemble the original description 
by Metropolis [|J in the fifties. For instance, Ising and 
other classical lattice models were for a very long time 
simulated using random spin flips. Such algorithms suffer 
from a critical slowing down in the neighborhood of the 
second order phase transition. However, some twenty 
years ago Swendsen and Wang found a solution using 
cluster updates 0, 0] , completely overcoming the critical 
slowing down for the classical Ising model. Monte Carlo 
methods have also been applied to quantum many-body 
systems, where one tries to sample either the wavef unc- 
tion or the the partition function [5[ . The quantum ana- 
log of cluster updates, namely the loop algorithm [f|, 
triggered the development of the worm algorithm Q and 
the operator loop alg orithm Q (and later the directed 
loop algorithm [ijllOl]) in the stochastic series expansion 
representation. These algorithms share the properties 
that they sample the one-body Green function, that they 
are formulated in continuous time 0] and are based on a 
world-line representation. Thus both algorithms are sim- 
ilar [ll[ . These algorithms have successfully been applied 
to spin systems and to the Bose-Hubbard model [l2| • Re- 
cently, the worm algorithm has been formulated in the 
canonical ensemble, allowing to study more systems in- 
cluding superconducting grains and the nuclear pairing 
Hamiltonian [l3l.[Ti|. 

The efficiency of a numerical simulation method is 
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primordial: efficient algorithms lead to more accurate 
results at the same computational cost and allow for 
the study of larger systems. The algorithms mentioned 
above are based on a Markov process that results in a 
random walk in a specific configuration space. Configu- 
rations are visited by the random walker proportionally 
to their respective weights. By the Markov process, 
subsequent measurements are trivially correlated. The 
transition matrix specifies the probability of going from 
one configuration to another, and has to be defined 
in advance. In practice, one requires that transition 
matrices satisfy the principle of detailed balance 
This, however, still leaves some freedom in the choice of 
the transition matrices, which can be used to optimize 
the efficiency of the algorithm. A convenient updating 
scheme is the Metropolis-Hastings algorithm 0, [l5j]: a 
limited number of configurations can be reached from 
the current one by defining a proposal distribution, and 
the transition to the new configuration is accepted or 
rejected according to detailed balance [l[ . 



In previous work, we introduced the notion of 'locally 
optimal Monte Carlo' [ill, when the degrees of freedom 
of every transition matrix are chosen in such a way as to 
mimic the globally optimal transition matrix, which, at 
least in principle, can be written down exactly. This ap- 
proach is a best guess for obtaining optimal efficiency, 
and was found successful in practice for the directed 
loop algorithm in the stochastic series expansion repre- 
sentation [16I |. Here we investigate the consequences of 
the locally optimal Monte Carlo idea for the worm algo- 
rithm, and try to combine the advantages of the directed 
loop algorithm with the advantages of the worm algo- 
rithm. This results in a new formulation of the worm 
algorithm, hereafter called the locally optimal worm al- 
gorithm (LOWA) [13 . We show results for various spin 
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models and the Bose-Hubbard model, and compare the 
efficiency of the LOWA with that of the directed loop 
algorithm in the stochastic series expansion framework. 



II. THE ALGORITHM 

We consider a two-body Hamiltonian H defined on a 
discrete lattice of system size L. The Hamiltonian can be 
written as H = Hq — V, where the terms Ho and V are 
to be specified later on. We also assume a single particle 



basis \i v ) 



) of Hq such that the action of 



any term in the Hamiltonian on a basis state yields a 
single basis state. The models that we typically have in 
mind are the Bose-Hubbard model, 

L jj L L L 

H = —t} j b\bj + — 2J ni(m—l)+Vnn E n ' ;n J~y^"»' 



and a general spin-S* model, 

L 
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In both expressions the notation refers to the sum 
over nearest-neighbor sites only. In the Bose-Hubbard 
model, bosons are created on site j by the operator &1~ 
and the number of bosons on site j is counted by the 
number operator rij . The kinetic term describes hopping 
of the bosons with tunneling amplitude t, while we 
consider two kinds of potential energy : on-site repulsion 
with strength U and nearest-neighbor repulsion with 
strength V nn - For the spin anti-ferromagnet (spin 
exchange amplitude J > 0) , we require that the lattice 
is bipartite. All matrix elements remain positive as long 
as the model does not exhibit any frustration which 
can for instance be induced by second nearest-neighbor 
hopping. As the calculations serve to demonstrate the 
ideas related to efficiency, we restrict the discussion to 
one dimension. 

A worm algorithm [7] is a quantum Monte Carlo algo- 
rithm where the decomposition of the partition function, 
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is sampled indirectly by performing local moves in the 
extended configuration space of open world-line configu- 
rations in the Green function sector, 



Tr 



T ((&i(<o)&K r ) + h.c.) expHMO) 



(4) 



The symbol T denotes time ordering, and 
we have introduced the Heisenberg operators 



(bi(t )b](T)+bl(t )bj(r) 



, ( , j ) which we call the 'worm' 

operators. Summing over all possible worms requires the 
extra sum evaluation of 
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where we have explicitly written out the Heisenberg 
worm operators, and where the '. . .' mean that they 
should be inserted at the right time, as implied by the 
time ordering of eq.(Tj|. Insertion of complete sets of ba- 
sis states allows to replace the operators H and V by 
matrix elements. We will choose to work in the number 
occupation basis for the Bose-Hubbard model and in the 
spin S z basis for the spin models. A natural choice is to 
consider the one-body tunneling operators as perturba- 
tions V, and collect the diagonal one-body and two-body 
operators in Ho. This leads to the path- integral formu- 
lation. The stochastic series expansion representation is 
found back when all operators of the Hamiltonian are put 
into V and Hq = 0. The integration over time is then im- 
mediate. In the path-integral formulation, the extended 
partition function can be written as 



*e = E E E 

n=0 i\,...,i n i,j 



dt ri 



tn 



dt n . 



dhW, (6) 



where the terms W(n, t±,...,t n ,i, j, to, t T , \i v )) can be in- 
terpreted as weights when positive. The weights depend 
on the order n and the integration times tj,j = 1, ...n 
of the perturbative expansion of the partition function Z 
in V and also on the position and the time of the worm 
operators, and all possible inserted basis sets \i u ). 

W = (i^V^e-^-^^ (i 2 \V\i3)e~ {t ^ )E ^ . . . 
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The Monte Carlo algorithm has to sample over all expan- 
sion orders n, all interaction times ti, all possible worm 
times and positions, and all basis sets In a worm 

algorithm this is achieved by moving one of the worm 
operators through the configuration space. Such updates 
shift, annihilate and create interactions and sample indi- 
rectly over all possible basis states. 

It is convenient to represent the weights in a graphical 
representation using world lines. An example is shown in 

Fig.ru 

Whenever the worm operators reach each other, the 
discontinuities in the graphical representation cancel 
and the resulting configuration belongs to the parti- 
tion function Z, apart from the worm matrix element 
{ik\bi{0)bj(0)\ik) ■ At this point we are free to assign any 
value to this matrix element. The standard choice, which 
we follow, is to take it constant for all states \ik) (see 
ea. (T2^1) ). This is graphically represented by the absence 
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III. UPDATES AND DETAILED BALANCE 



o 



FIG. 1: Graphical representation of a typical configuration 
in the Green function sector. Time goes from left to right 
in the figure, there are five sites. World lines are denoted 
by single lines (site is once occupied), double lines (site has 
occupancy two) or dashed lines (site is not occupied). Inter- 
actions (hopping of a particle) are denoted by vertical lines. 
The two circles mark a discontinuity in the world lines and 
correspond to the worm operators. One of them creates an 
extra particle, the other one annihilates it. As a consequence 
of the U(l) symmetry, total particle number is conserved at 
every interaction. 



of 'circles' (which denoted the discontinuities in Fig. [J) 
in a world-line picture. 

In the literature there have been two different imple- 
mentations of the worm idea. First, there was the worm 
algorithm by Prokof'ev et al. Q, which was formulated 
in the path-integral representation. Later the operator 
loop [|[ and directed loop algorithm H,[l(| in the stochas- 
tic series expansion representation were formulated. 

Compared to the original formulation of the worm al- 
gorithm by Prokof'ev et al. @, the efficiency of the di- 
rected loop algorithm in the stochastic series expansion 
representation was found superior for spin systems and 
even for a homogeneous Bose-Hubbard model in both 
phases (except for the extreme soft-core case) 18]. This 
is an expected result for spin systems where the diago- 
nal energies are of the same order as the spin exchange 
amplitudes, but for the Bose-Hubbard model this result 
feels unsatisfactory. For a trapped Bose-Hubbard model 
however, the worm algorithm was found superior [l8| . 

In the present paper we think of a new formulation of 
the worm idea, trying to combine to advantages of the di- 
rected loop algorithm with those of the worm algorithm. 
More specifically, we want an algorithm where the worm 
inserts and annihilates the interactions (as in the worm 
algorithm) , but we also want to use the directed worms as 
in the directed loop algorithm. The modification of the 
diagonal factors and the hopping factors contributing to 
the weig ht, eq.([7|), should be done in a locally optimal 
way [l0| . 



The easiest way to ensure that the Markov chain con- 
verges to the correct invariant probability distribution is 
by assuring detailed balance Q, 

W(X)T(X Y)q = W(Y)T(Y -» X). (8) 

The current configuration is denoted by X, the new con- 
figuration by Y. q is the acceptance factor to be used in 
the Metropolis algorithm, where the transition rule for 
going from X to Y is given by P(X -> Y) = T(X -» 
Y) min(l, q), while the transition rule for the reverse up- 
date is given by P(Y -»■ X) = T(Y ->■ X) rninfl, 1/q), 
such that W(X)P(X Y) = W(Y)P(Y -»■ X). We 
will now show that detailed balance is fulfilled for every 
possible update occurring in the algorithm. 



A. Move 

The simplest possible update is to move the mobile 
worm to another (later/earlier) time. We assume that the 
mobile worm is on site j, and corresponds to an operator 
bj. The configurations before and after the update are 
shown in Fig. [2] 




FIG. 2: Graphical illustration of the Move update. The worm, 
originally at time tk tries to jump to a later time ty . The state 
to the left of the worm is with energy Ei k . The state to 
the right of the worm is |ifc+i), with energy Ei k+1 . 



The relevant factors contributing to the weights before 
(W(X)) and after (W(Y)) the update are 

W[X) = e- tkE ^{i k \b 3 \i k+1 )e tkE ^ (9) 
W{Y) = e- t « Ei >{i k \b j \i h+ i)e i *' E, >+i. (10) 

The notation i k means the complete state over all sites 
as in eq.©, but the subscript j means the particular site 
j. The acceptance factor reads 

e-^u T(Y^X) 
q e -A*£ Ifc+1 T(X^Y)' { ' 
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with At = tk> — tk ■ The exponentials can be canceled by 
choosing the transition probability densities as 



Ik-] 



T(X 
T(Y - 



Y)dt = E lk e~ AtE *»dt 

-ME. 



X)dt = E lk+1 e~ 



k + 1 dt. 



(12) 
(13) 



The normalization factors Ei k and Ei k+1 enter into q. 
These factors will be taken into account explicitly to- 
gether with the interaction matrix elements at the mo- 
ment that interactions can be inserted/annihilated, see 
section MI Bl 

Let u be an uniform random deviate, u € [0, 1[. Then, 
p = — log u follows an exponential distribution and we 
can compute the time shift window At as At = p/Ei k = 
— \og(u)/E ik , with E ik > 0. We come back to this issue 
in Section Ull Dl The recipe for the Move update is thus 

1. Draw a random deviate u £ [0, 1[. 

2. When moving to the right (left), compute the time 
shift window At = — \og(u)/E, where E is the di- 
agonal energy to the left (right) of the worm. 

3. If no interaction is encountered, update the current 
worm time from time t to time t + At. 

This amounts to a random walk based on Poisson steps. 
The exponential factors contributing to the weight, 
eq.®, might fluctuate heavily. The strong point of the 
Poisson moves is that these exponential factors are can- 
celed exactly. 



B. Inserting, removing and relinking an interaction 

So far we assumed that the worm did not encounter any 
interaction during its propagation. What happens if a 
worm does reach an interaction? At that point a decision 
must be taken: can the worm pass the interaction or 
not? Or shall we delete the interaction? In that case we 
must also define updates to insert interactions that are in 
balance with the removal of interactions. In this section, 
we will first discuss two cases where the worm can pass 
the interaction, leaving it unchanged. Second, we discuss 
the insertion and removal of interactions which, as we will 
see, also incorporates the modification of interactions. 

When the worm b\ encounters an interaction b\.bi 
whose sites are different from the site on which the worm 
head resides (i ^ k, I), the worm can pass the interaction 
with probability one. This is a consequence of the com- 
mutator of the worm operator and the interaction being 



zero 



i i k 



blbi] 



pTj . Second, when a worm operator 



b\ encounters an interaction b]bj, it can pass the interac- 
tion with probability one. This is also the result of the 
commutator being zero, [b\,b\bj] = [l7| . 

Suppose we move to the right and want to insert an in- 
teraction. The update consists of a move from imaginary 
time t w to the time tk , inserting an interaction and then 
continue the move to the imaginary time tw , as shown in 



<l 



<l 



Ik Ik 1 T lfc+i 



site j-I 
site j 



site j-I 
site i 



tk 



FIG. 3: Graphical illustration of the insertion of an interac- 
tion by the worm. The worm, originally at the time t w before 
the time tk, jumps to the time tk and inserts an interaction 
there from site j to site j — 1. Afterwards the worm jumps 
to a later time ty, where we assume the worm always has to 
halt. Later we will relax this condition. Between times tk and 
ty a new state iy is created. Note that this update is only 
possible if the occupation on site j — 1 is larger than zero. 



Fig. [3] Let's suppose that the worm pair was created at 
t w , and that we want to evaluate estimators at time ty , 
meaning that the worm has to halt at time ty. We will 
later discuss generalizations. The relevant contributions 
to the weights of the configurations before and after the 
update are 



W(X) 
W{Y) 



-'-^(ifcNifc+Oe*- 1 ^ 1 (14) 



(i fe ^-iKfc+i}e tfc ' Ei *+ 1 



(15) 



The transition to move to the new configuration Y is 
given by the probability density 



T M (X -» Y)dt = E ih e- AtEi *e- At ' Ei *'dt, 



(16) 



with At = tk — t w and At' — ty — tk- The second ex- 
ponential does not have an E prefactor with it, since all 
time shifts larger than At' lead to the same final config- 
uration Y. Similarly, for the reverse update 



poo 

T M {Y^X) = / E, 

JAt 



tE, 



A/' 



' k + 1 dr x 



»+iE ih+1 dr 



e - AtE <k+l e - At ' E >k+l 



(17) 



Once again, all the exponential factors cancel in the ac- 
ceptance factor q. There remains a factor (i k \V\iy) / Ei k 
(with V the interaction), which is taken into account in 
the equations of detailed balance for the actual insertion 
of a new interaction. If the worm was not forced to halt 
at the time ty, the update would continue with the in- 
sertion of another interaction at ty (which would take 
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(a) 



(b) 



(c) 
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FIG. 4: When the worm moves to the right and tries to in- 
sert an interaction in configuration (a), the two possible new 
configurations are configurations (b) and (c). The third pos- 
sibility is bouncing back and changing direction in configura- 
tion (a). The transition matrix is thus a 3 x 3 matrix in one 
dimension, or a (2d + 1) x (2d + 1) matrix in d dimensions. 



the extra Ei k , (i.e., the normalization factor of the sec- 
ond exponential in the right hand side of eq. (fT6|) ) into 
account). 

When inserting a new interaction, we can either make 
the current site j interact with site j + 1 or site j — 1 
in one dimension. In higher dimensions d, interactions 
can be inserted to all 2d neighboring sites. We have to 
define the (conditional) probability distribution function 
that samples the three configurations of Figf5J When we 
are in configuration a, we have to define P(a — > b) and 
P(a — > c), both corresponding to the insertion of a new 
interaction. Similarly, when we are in configuration b 
we have to define P(b — > a) and P(b — > c), correspond- 
ing to the removal and the modification (relinking) of an 
interaction, respectively. Updating between the config- 
urations a, b and c should be done proportional to the 
following factors contributing to the weights, 



w(a) = (i k \b % +1 )E lk (18) 
w(b) = (i k \b j b]_ 1 \i , )(i , \b J _ 1 \i k+1 ) (19) 

w(c) = <i*|&i65+il* // X» / >i+i|*Jk+i)- (20) 

The energy term in ea. (|18|) will be explained in sec- 
tion llll El In Fig. [4l the worm is on site j in configuration 
(a), on site j — 1 in configuration (b) and on site j + 1 
in configuration (c). We will discuss three possibilities to 
sample configurations (a), (b) and (c). 

• The 'Metropolis-like' way. Assume that we are in 
configuration (a). We choose with equal probabil- 
ity between (b) and (c). Say (b) was chosen. The 



transition to (b) is then accepted with probability 
mm[l,w(b)/w(a)]. If the update is not accepted, 
we stay in configuration (a). This approach has 
the advantage that only the matrix elements for 
configuration (b) and (a) have to be calculated or 
known, the ones for (c) are not needed. This ap- 
proach is thus recommendable for high dimensions, 
and for long-range interactions. The (normalized) 
3x3 transition probability matrix reads 



T 



Met 



1 



Qab - 
qba 
(lea 



9q 



<lab Qac 
1 - • • ■ qbc 

q C b 1 - • 



(21) 



with qu = (1/2) mm[l,w(l)/w(k)]. The diagonal 
element corresponding to the lowest weight is thus 
zero. 

The heat-bath way. We choose between (a), (b) 
and (c) according to their relative weights, tt(J) = 
w(j)/(Ylk irrespective of the current con- 

figuration. The weights of all configurations are 
needed in order to evaluate their sum. The transi- 
tion probability matrix is 



7r(a) 7r(6) 7t(c) 
7r(a) 7r(6) 7t(c) 
7r(a) 7r(6) 7t(c) 



(22) 



The locally optimal way. Heat-bath updates 
can have relatively large diagonal elements, which 
are associated with large rejection ratios or large 
bounce probabilities. The principle of locally op- 
timal Monte Carlo updating suggests an optimal 
transition probability matrix, 



T 



Lo 







1-TTi 

El 



7T2 
1-7T1 


7T 2 1 — 27Ti 
7T3 1 — TTi 



1-7T1 
1-27T1 
1-7T1 
1 - . . 



(23) 



where the normalized weights tt(J) are now ordered 
in ascending order, 7Ti < 7T2 < 7:3. The ordering of 
the weights makes sense only if they can be tabu- 
lated before the start of the actual simulation. The 
locally optimal matrix is the stochastic matrix with 
the lowest possible second largest eigenvalue, which 
is negative. 

The non-zero diagonal elements correspond to bounces, 
meaning that the worm head changes its direction and 
undoes its changes until it reaches another point where 
an interaction can be changed, removed or inserted. 



C. Insertion and Removal of a worm pair 

The only remaining update to discuss is the inser- 
tion/removal of a worm pair, which is the connection 
between the partition function sector Z and the Green 
function sector, Z e . The update is depicted in Fig. [5) 
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FIG. 5: Graphical illustration of the insertion of a worm pair. 
An arbitrary site j and an an arbitrary time tk is chosen. We 
have shown here the case that the occupation between the 
worm ends is increased by one. 



In case of Fig. El the weights before and after the up- 
date are 

W(X) = C z (24) 
W(Y) = (i k \b j \i'){i'\b]\i k ) = (n j + l), (25) 

where Cz is a constant specifying the relative weights 
of the partition function and the Green function sectors, 
and the number rij is the occupation on site j before a 
worm pair is inserted. This is the usual convention in 
a worm algorithm. However, we are free to choose the 
weight of the worm matrix elements, and in SSE they 
are usually taken as unity. One of the worm ends will 
be moved through configuration space and is called the 
mobile worm, while the other end remains stationary. We 
choose to always annihilate a worm pair when the mobile 
worm 'bites' into the stationary worm, 



P(Y -> X) = 1. 



(26) 



A worm pair is inserted by choosing a random time and 
a random site. Thus (in case of Fig. [5]), 



P(X -» Y) 



1 1 

PL 



P(an)P(dir) 



(27) 



We have to define the probabilities P(cr) and P(an) that 
the occupation between the two worm operators is higher 
('an') or lower ('cr') than outside this infinitesimal in- 
terval. We choose with equal probability among those, 
except when the occupation is zero or equal to the max- 
imum occupation allowed. In case of zero occupancy, we 
choose with probability 50% not to insert a worm pair, 
and analogous to the case of maximum occupation num- 
ber. Thus, 



^P(an) = i 



P(cr) 



1-4 



n,-,0 



(28) 
(29) 



In case of hard-core bosons or spin— 1/2 systems, one can 
modify this relation so as to always insert a worm pair. 

When a worm pair is inserted, there are two possi- 
bilities: of we move forward in time creating a parti- 
cle, or we move backward in time creating a particle. 
Physically, moving forward in time and annihilating a 
particle is the same as the latter, while moving back- 
ward in time and annihilating a particle is the same 
as the former. Yet, the decorrelation benefits if both 
a direction (forward/backward) and an operation (cre- 
ation/annihilation) are chosen. At this point, we explic- 
itly choose a direction, and we take 



P(dir =->■) = P(dir =<-) = 1/2. 



(30) 



This equation is correct in the directed loop algorithm 
since there are on average as many moves to the left as 
there are to the right. 

Finally, we have to fix the constant Cz- Apart from 
the cases rij — and rij — Nm^x, which we have already 
discussed, the update is always accepted when we choose 
(in case of Fig. O 



C z = 4/3 L. 



(31) 



With this choice, the Green function G(x,t) is automat- 
ically correctly normalized. The procedure to measure 
will be discussed in Sec. MI El while the value of the mea- 
surements is the same as the ones discussed in the gener- 
alized directed loop algorithm [l9j and in the canonical 
worm algorithm [131 ]. 



D. Refinements 

The perturbative expansion of the partition function 
is a Poisson process: the interactions (events) are dis- 
tributed according to a Poisson distribution, with the 
intervals between them following an exponential distribu- 
tion. We sampled these intervals exactly using exponen- 
tial deviates divided by an energy. According to eq.([7]), 
these energies are the diagonal energies of the system. 
Taking into account that the direction of propagation is 
fixed in the directed loop algorithm, we only consider 
positive time shift windows At. Because the exponential 
distribution exp(— x) is only defined for positive x, the 
energies have to be positive, which is not a priori guar- 
anteed. Let Pl(r.) be the energy to the left (right) of the 
mobile worm. We can proceed in the following ways: 

1. We shift all energies with a large, positive con- 
stant, clr = Eq + Plr- This will result in small 
time shift windows. Since these energies also en- 
ter in the equations of detailed balance for insert- 
ing/removing/relinking an interaction, they might 
lead to large bounce ratios. 

2. We try to make use of the fact that only energy 
differences are physically relevant. We have the al- 
gorithmic freedom to choose any pair of 6l and £r 
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such that £l — cr = E^ — Er and £l , £r > . A good 
choice is cl,r = ^ — mm [E^, En]. This quantity 
can be zero, meaning that the jump in imaginary 
time is infinite, i.e., that the next interaction is al- 
ways reached. Compared to the previous approach, 
we make thus larger jumps in imaginary time, and 
the energies that enter the detailed balance equa- 
tions of inserting/removing/relinking an interac- 
tion are of the same order of magnitude as the 
hopping matrix elements. However, we found that 
this parameter choice resulted in some anomalously 
long (non-closing) loops and in problems with er- 
godicity when these long loops are discarded. 



3. We suggest to use the shifted energies 
£l,r = -E l ,r - min [£ L , Er] + E oS 



(32) 



where the energy offset E g overcomes the afore- 
mentioned problems with ergodicity. It makes 
sense to choose E Q g — (V) with (V) a typical ma- 
trix element of the interaction. 

After the worm has passed an interaction at time tk, 
we have to calculate the new energy parameters £l ,r 
and consider a new time shift. The latter can be ac- 
complished by either drawing a new exponential deviate 
(as in step 3), or by adjusting the time shift window 
At — » At — (tk — tk-i)e^ (when moving to the right), 
using the property that the exponential distribution has 
no 'memory'. 



E. Detailed balance and the directed loop 
algorithm 



normalization of the preceding Poisson jump. Both these 
factors are balanced through the locally optimal transi- 
tion matrix of section UlI Bl As a result one finds that q 
is exactly equal to one, hence all moves can be accepted, 
which greatly simplifies the computer code. The result- 
ing algorithm samples the configurations that appear in 
the decomposition of eq.([3]). Note that the above rea- 
soning still holds if we assume that the worm continues 
to move in the same direction after each Poisson move 
and only allow bounces at the moment when one decides 
whether or not to insert or remove interactions. This 
leads to a version of the worm algorithm that is similar 
to the directed loop algorithm for SSE @, US Gil • 

The intermediate steps of the algorithm correspond to 
configurations in the decomposition of the extended par- 
tition function Z e of eq.©. To understand this, consider 
a point in imaginary time at a distance t of the time 
where the worm was inserted. Now suppose we halt (this 
is crucial for measuring the Green function) the worm 
head at the moment it passes the point t, and suppose 
we choose with probability 1/2 whether to continue the 
move in the same direction or to move in the opposite 
direction (we will see furtheron that this latter condition 
is not necessary). Following exactly the same reasoning 
as above, we see that this algorithm leads to detailed bal- 
ance between configurations in the decompositions of Z 
and Z e (t) . Now from here we can derive that this is true 
even if we continue the worm move in the same direction 
at the moment of passing the point t. To see this it is 
instructive to consider both directions as two branches 
of a normalized transition kernel 'J that depends on two 
variables, namely the time At and the direction D, and 
fulfills detailed balance: 



To prove detailed balance one has to consider the 
global updates. Such a global update starts and ends 
with a jump in imaginary time (thus only a forced halting 
at a chosen time can end a global update). In between, 
there can be any number of insertions, annihilations or 
relinks. For example, the following sequence satisfies de- 
tailed balance : insertion of a worm pair - jump - inser- 
tion of an interaction - jump - insertion of an interaction 
- jump - removal of a worm pair. We have already shown 
that the insertion and removal of a worm pair are bal- 
anced, so we just have to look at the sequence starting 
and ending with the jump. For every such sequence there 
exists also the exact opposite sequence. Writing down the 
acceptance factor for the global move, 



q = 



W(Y)T(Y --> X) 
W{X)T{X -> Y) '■ 



(33) 



one finds that all exponential factors in the weights 
are cancelled out by the probabilities for the Poisson 
jumps. For each insertion there enters an interaction ma- 
trix element («l|V|«,r) in de numerator due to the ratio 
W(Y) /W(X), and a factor E (E L for moves to the right, 
En for moves to the left) in the denominator due to the 



W(X)$(At, ->) = W(Y)^(At, <-), 



for At = \ty — tx\ and tx < ty . We have that 



(34) 



P(X -> Y) 



#(AT, 
*(AT,- 



,X<Y 
,X>Y. 



(35) 



Thanks to time reversal symmetry, the statement that a 
worm creates a particle and propagates forward in time is 
completely equivalent to the the statement that the worm 
annihilates a particle and propagates backward in time. 
Therefore both directions will occur with equal probabil- 
ity. Suppose that at a given moment a configuration X 
is the actual one with a probability proportional to its 
weight W(X). Then the probability for a configuration 
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Y to occur at the next step is proportional to 



F. Stochastic Series Expansion representation 



W(X)P(X -v Y)dX 

= [w(X)V(At=\t Y -tx 

D J 

-■ [ W(X)$(At,^)dX 

Jt Y >t x 

■■ [ W(Y)V(At,<-)dX- 

Jt Y >t x 

■■ I W(Y)V(At,D)dX 

D ■* 

-■ W(Y) J P(Y -> X)dX 



,D)dX 



t Y <tx 



t Y <tx 



W(Y). 



As we have already mentioned, we end up in the 
Stochastic Series Expansion (SSE) representation if we 
treat all terms in the Hamiltonian as interactions. In the 
present formulation this means that the diagonal energies 
are always zero, leading to infinite time shift windows all 
W(X)^(At, <—)dXthe time. The mobile worm will jump from interaction 
to interaction, either deleting or relinking it or bouncing 
back. There is a serious problem with this algorithm, 
because it will never insert a new interaction. There- 
fore, in the SSE representation one needs two updates: 
one update is to scan over all (discrete) times in order 
to insert and remove interactions, the other one consists 
of modifying interactions with a fixed graph. The proof 
of convergence with directed loops in the second update 
^g^proceeds in the same way as for the LOWA algorithm. 



W{Y)^{At,->)dX 



This equation holds for any algorithm where the direction 
is fixed, and also forms the basis of the bounce algorithm 
ofref.fH. 

So, even when we do not change the direction of the 
worm at the moment that it passes the point r, the prob- 
ability to pass a point r is still proportional to the weight 
of the corresponing configuration in the extended parti- 
tion function Z e . This is quite subtle : Suppose we do 
a jump of the worm operator, without encountering an 
interaction. Then it is not possible to immediately come 
back to the original configuration in the next step since 
the direction is preserved. This observation lies at the 
heart of the proof of convergence of the algorithm given 
by the authors of Ref. @, M, Ql]. They prove that de- 
tailed balance is satisfied between any two diagonal con- 
figurations, from which it follows that every local step is 
balanced (in the SSE representation). Here, we see that 
detailed balance is fulfilled every time the worm is forced 
to halt at a chosen time, but we emphasize that precisely 
at the moment of inserting or annihilating or relinking an 
interaction (without further jump of the worm) detailed 
balance is not fulfilled. 

Because the probability distribution for two consecu- 
tive Poisson steps in the same direction is identical to 
the probability distribution for a single Poisson step, one 
finds that in this case the dynamics of the algorithm is 
completely equivalent to the dynamics without consider- 
ing a special point r. Therefore one can state in general 
that the probability for the worm head to pass a point at 
a distance r from the worm tail is given by the weight of 
the corresponing configuration in the extended partition 
function Z e . 

This observation allows for an efficient and unbiased 
evaluation of the equal-time and unequal time Green 
function Gjj(r) — Z e ^j(r) / Z: each time the worm head 
at site i passes the worm tail on a different site j, one 
has a measurement for the equal time Green function 
Gij(0), i.e., the one-body density matrix. Counting the 
times that the worm passes at a distance r, one obtains a 
measurement of the unequal time Green function Gij (r) . 



G. Summary of the LOWA algorithm 

We recapitulate and write down the full algorithm for 
the soft-core Bose-Hubbard model. 

1. Pick an arbitrary site and an arbitrary time and 
call it (io,To). Find the occupation on all sites at 
that particular time tq. Calculate the correspond- 
ing diagonal energy. A direction (left or right) is 
chosen with equal probability. Assume propagation 
to the right was chosen. 

2. At (io>T~o) a worm-pair (tail-head) is inserted. If 
the occupation is higher than zero, the occupation 
between the worm ends can either be increased or 
decreased. If the occupation at (io, tq) is zero, then 
with probability 50% a worm is inserted with in- 
creased occupation between the worm ends. With 
probability 50% no worm is inserted. 

3. When moving to the right (left), we denote by e the 
shifted energy to the left (right) of the worm head, 
eq. (|32[) . Draw an exponential deviate, p = — ln(u) 
with u an uniform random number, < u < 1. 
Evaluate the imaginary time shift window At = p/e 
and the new worm time r' = r + At. 

4. If the worm head encounters the worm tail (at the 
same site) during its propagation, the update ends 
with probability one and we arrive at a new diago- 
nal configuration. 

5. If the new worm time is larger than the time to the 
next interaction, the new worm time only equals 
the time of the next interaction. The worm can 
cither bounce back, pass, annihilate or relink the 
interaction, according to the locally optimal tran- 
sition matrix. 

6. If no interaction is encountered in the imaginary 
time shift window, the worm shifts to its new time 
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where an interaction is inserted or a bounce occurs, 
according to the locally optimal transition matrix. 

7. Go back to step 3. 

Every local single step respects the invariant distribu- 
tion in the Green function sector. When the mobile worm 
reaches the stationary worm, one can measure diagonal 
observables such as the energy, winding number, density, 
etc. They can be updated in the same way as in the 
directed loop @ and worm Q algorithms. 

The algorithm described above is valid when the di- 
agonal energies involve a single site, but also when the 
diagonal energies contain nearest-neighbor repulsion re- 
pulsion terms, and even longer range interactions. 

The worm algorithm in path-integral representation 
is in essence the same algorithm as the LOWA algo- 
rithm. The only difference is the way ergodicity and 
convergence to the correct invariant distribution are im- 
plemented with respect to the direction of worm prop- 
agation. In the worm algorithm, one chooses at every 
step between forward and backward propagation in time 
with equal probability, while in the LOWA the direction 
is maintained until an interaction forces the worm to al- 
ter its direction of propagation. In the context of the 
canonical worm algorithm, even other choices have been 
implemented [HI]. All these algorithms are the same in 
spirit, they are slightly different implementations of the 
same idea of performing local updates in the Green func- 
tion sector. The directed loop algorithm has previously 
been formulated in the path integral formulation Q . Al- 
though there are some resemblances, the LOWA is dif- 
ferent, more general, and the principles that lie at heart 
of the derivation of the algorithm are different. 

IV. IS THE LOCALLY OPTIMAL WORM 
ALGORITHM EFFICIENT? 

Although we have argued why we believe the proposed 
LOWA is efficient, we can only verify by doing numerics 
in order to get a definite answer on its efficiency. The 
results are compared with data obtained by the directed 
loop algorithm in the stochastic series expansion repre- 
sentation of Refs. [t| [l(| HH , which will be abbreviated as 
DLSSE. We used the method of Ref. [ij] for the actual 
computation of the DLSSE. The error and autocorrela- 
tion estimation was done using a binning analysis. 

A direct comparison is complicated because present 
implementations use different data structures for both 
algorithms. In the DLSSE, a doubly linked list is con- 
structed before the loop update. Since the graph is fixed, 
the number of elements in this list cannot change. This 
allows to allocate memory statically. In the worm al- 
gorithm on the other hand, the number of interactions 
can change at any time. In our Fortran code this was 
implemented using two arrays of a predetermined fixed 
length, corresponding to the interactions before and after 
the current mobile worm time. 



A. Bose-Hubbard model 

We have calculated the standard deviations on the 
kinetic energy and on the squared density for a one- 
dimensional Bose-Hubbard model of size L = 32 sites at 
an inverse temperature of (3 = 32 and with a fixed chem- 
ical potential fj, = 2 in the absence of nearest-neighbor 
repulsion, V nn = 0. We work in units t = 1. Simula- 
tions consisted of 40 bins that each ran 300 seconds on 
a Pentium III processor. We imposed a particle number 
cutoff of ten particles per site for U = 1,2 and U = 3, 
while a cutoff of five particles per site was taken for the 
other values of U, ranging from U = 4 to U = 10. Im- 
posing a cut-off is a necessity for the DLSSE, but not for 
the LOWA. The number of loops per update was opti- 
mized along the guidelines of Ref. [101 • The Mott phase 
is reached for U — 6. 

We have calculated the standard deviation on the 
kinetic energy and on the density squared (i) for the 
DLSSE. (ii) for the LOWA where the diagonal energy pa- 
rameters were chosen according to approach (a) (iii) for 
the LOWA where the diagonal energy parameters were 
chosen according to approach (c) . In (ii) and (iii) the 3x3 
transition matrices were taken as the locally optimal ones 
as in eq. ( |2"3"|) . We shall discuss this in section IIV CI In 
Figs. [6] and [7] the results for algorithms (i) and (iii) are 
shown. 

Among the different LOWA optimization parameters, 
approach (iii) is almost always the most efficient one. 
The DLSSE seems to be the preferred model for very low 
values of U . However, approach (ii) performs lots bet- 
ter than approach (iii) in this regime, and its efficiency 
is comparable to that of the DLSSE. When the diagonal 
matrix elements are much larger than the off-diagonal 
ones, as in the Mott phase U > 6, the present algorithm 
is superior. Admittedly, we recognize that it is not un- 
ambiguous how the directed loop simulations should be 
performed in the Mott phase because of the very short 
loop sizes. 



B. Spin systems 

For spin systems, the magnitude of the diagonal and 
the off-diagonnal matrix elements is of the same order. 
One can thus expect that the DLSSE is more efficient 
for spin models than for soft-core bosonic models, since 
diagonal and off-diagonal operators are treated on equal 
footing in the stochastic series expansion representation. 
It is thus interesting to compare the efficiency of the lo- 
cally optimal worm with the efficiency of the DLSSE for 
a spin- 1/2 chain. The LOWA was most efficient when the 
energy offset parameter was set to E h = 0.5. In Fig. [5] 
the standard deviations of the total energy and of the ki- 
netic energy are shown, which are obtained by applying 
the LOWA and the DLSSE to a spin-1/2 chain subject 
to a magnetic field H . We find that the locally optimal 
worm is superior to the DLSSE in our implementations, 



10 



0.01 



5 



0.001 



0.0001 




9 10 



0.00018 r 

0.00016 - 
0.00014 - 




FIG. 6: Standard deviation (statistical error) on the kinetic 
energy for the directed loop algorithm in the stochastic se- 
ries expansion framework (DLSSE), and the locally optimal 
worm (LOWA) using an energy offset added to the energy dif- 
ference for the diagonal energy parameters and using locally 
optimal updates (eq.f I23p). Simulations consisted of 40 bins 
that each ran 300 seconds on a Pentium III processor for a 
one- dimensional Bose-Hubbard model of L — 32 sites at an 
inverse temperature of j3 = 32 and with a fixed chemical po- 
tential n = 2. The accuracy of the data points is about ten 
percent. 



FIG. 8: Standard deviations on the total energy (a(E)) and 
on the kinetic energy (a(Ekin)) obtained after a simulation of 
a spin- 1/2 chain with J = A = 1 consisting of L = 64 sites 
and with inverse temperature /3 = 16. Simulations have been 
performed with the directed loop algorithm in the stochastic 
series expansion framework (DLSSE) and with the locally op- 
timal worm algorithm (LOWA) . Computations consisted of 40 
bins that each ran for 60 seconds on a Pentium-Ill processor. 



Heat-bath updates and the locally optimal 
matrix 
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FIG. 7: Idem as in Fig.|6j but now for the standard deviation 
on the average square density, (n 2 ) = rii) 2 )jL. 



but it is more meaningful to say that both algorithms 
behave similarly as the magnetic field is increased, while 
the algorithms are most efficienct near H = 0. Analogous 
conclusions were found for a spin-1 chain. For a spin- 2 
chain however, the DLSSE was found to be superior. 



A comparison between the heat-bath and locally opti- 
mal approach is made in Fig. [9] for a one-dimensional 
Bose-Hubbard model with parameters U = 1,/i = 
0.5, (3 = 32, L = 32, V nn = 0.4 and varying tunneling am- 
plitude t. The ratio of the standard deviation obtained 
by the locally optimal approach to the standard devia- 
tion obtained by the heat-bath approach is shown for the 
condensate fraction and the total energy. We see that 
the locally optimal approach is on average ten to fifteen 
percent better, but the effect is less pronounced than in 
the DLSSE [3, [lO, Oil LL9| • Even smaller differences were 
found for some other parameter regimes. 



D. Scaling of the worm with system size 

Both the worm and the DLSSE are O(N) methods. 
In the absence of correlations between subsequent mea- 
surements, the needed computation time for a desired 
accuracy scales linearly with system size and inverse tem- 
perature. The scaling efficiency is further determined by 
the dynamical exponent z, which describes how the inte- 
grated autocorrelation time scales with system size and 
inverse temperature. The worm and directed loop algo- 
rithm have very low dynamical exponents; z is even zero 
in some high-dimensional cases. This beneficient scaling 
is the cornerstone for the study of very large system sizes 
at very low temperatures. Since the present algorithm 
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FIG. 9: Simulation of a one-dimensional Bose-Hubbard model 
with parameters U = 1,(1 = 0.5, (3 — 32, L — 32, Km = 0.4 
and varying tunneling amplitude t. Plotted is the ratio r 
for the condensate fraction (r(c/)) and for the total energy 
(r(E)), being the ratio of the standard deviation olo obtained 
by the locally optimal approach to the standard deviation ant 
obtained by the heat-bath approach (r = Simulations 
consisted of 40 bins of 300 seconds on a Pentium III processor 
per data point. 



FIG. 10: Standard deviation on the average kinetic energy 
per site as a function of the system size L multiplied by the 
inverse temperature /3 for an isotropic spin-1/2 chain in zero 
magnetic field. Plots are shown when the system size and 
the inverse temperature are increased simultaneously (L = 
f3), when the system size is held constant at L — 128 and 
only the temperature f3 is varied (L = 129), and when the 
temperature is held constant at (3 — 16 and the system size 
L varies (/3 = 16). 



is based on the same principles as the directed loop al- 
gorithm and the worm algorithm, one expects that the 
dynamical exponent is similar (at least of the same or- 
der) but the prefactor of the scaling behavior might be 
different. 

We studied the scaling behavior for the critical system 
of an isotropic spin-1/2 Heisenberg chain (A = J = 1) in 
zero magnetic field (H = 0), for which the worm updates 
are fast. We investigated the effects of increasing system 
size L at fixed inverse temperature [3 on the one hand and 
of increasing the inverse temperature (3 at fixed system 
size L on the other hand. This allows us to see whether 
the algorithm scales symmetrically with system size and 
temperature or not. All calculations ran for a fixed time 
of 40 bins of 300 seconds each on a Pentium III processor. 
We used optimal parameters for the simulation: we use 
the locally optimal transition matrix of cq. (|23[) and we 
set E D ff = 0.5. We focused on the standard deviation and 
the integrated autocorrelation time of the kinetic energy, 
since the modes of this observable couple to the slowest 
mode of the simulation while the measurements can be 
calculated at low computational cost. In Fig.[TOJwe study 
the standard deviation (statistical error) on the average 
kinetic energy per site. When the system size and the 
inverse temperature are sufficiently large, we see a grad- 
ual increase in the statistical error with an exponent of 
a ~ (LP) - 5 , irrespective of whether L, (3 or both (note 
that the quantum imaginary time direction scales as one 
classical space direction) are increased. Since, for larger 
lattices, the worm will visit each site less often, this result 



is intuitively understandable. From the data in Fig. [TO] 
we can already see that the dynamic exponent z obeys 
< z < 1. Strangely, when (3 is taken too low (see the 
data points at L(3 = 16 in Fig. [TO]), the present algorithm 
loses its efficiency. Below we will relate this to higher in- 
tegrated autocorrelation times. A similar picture results 




Lx |3 

FIG. 11: Idem as in Fig. 1101 but now the loop size (Sioop) is 
shown. The loop size S^op is defined as the total number of 
interactions passed, inserted, annihilated and modified by the 
mobile worm in a single update. 

when we look at the loop size in Fig. [TT] We define the 
loop size Sioop as the total number of interactions passed, 
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FIG. 12: Idem as in Fig. 1101 but now the integrated autocor- 
relation time of the kinetic energy, r(Eki n ) is shown. 



inserted, annihilated and modified by the mobile worm 
in a single update. We see that the loop size 5ioop scales 
as Sioop ~ {LP) 1 . The loop size increases thus linearly 
with the increase in area in space-time. However, if we 
are very close to the ground state at a fixed system size , 
then increasing (3 does not result in longer loops and the 
algorithm loses its scaling properties. 

The integrated autocorrelation time r(£'fej n ) of the ki- 
netic energy in Fig. 1121 shows a similar pattern: for large 
enough space-time areas, the integrated autocorrelation 
time scales as T{Eki n ) ~ (L/3)°' 42±2 , which was fitted to 
the curve L = j3 in Fig. [12] When the inverse tempera- 
ture is too low, we unexpectedly find high autocorrelation 
times which explains the high standard deviations for the 
same data points in Fig. [TU] This behavior could be due 
to the fact that the mobile worm is always forced to make 
relatively large jumps in the time direction. 

With the DLSSE, the integrated autocorrelation time 
scales as T(Ekin) ~ (L/3) 38±2 . This result is in agree- 
ment with Ref. |{|. The dynamic exponent of the DLSSE 
is thus lower, yet in our calculations the standard devia- 



tions of the DLSSE increased more rapidly with increas- 
ing system size. This is due to the increase in compu- 
tational cost for a single update, which scales worse for 
the DLSSE. The computational cost of a single update 
depends strongly on the way of implementation, and the 
scaling of the standard deviations with system size should 
be interpreted accordingly. In addition, when looking at 
the magnetization on every site, the worm algorithm per- 
forms much better than the DLSSE for all system sizes. 
We conclude that efficiency largely depends on the im- 
plementation of the algorithm and on the observables of 
interest [lj|. 

V. CONCLUSION 



In conclusion, we have presented a new formulation of 
the worm algorithm. The present algorithm has been 
derived using the concept of locally optimal Monte 
Carlo [1 61 ] and incorporates ideas both from the worm 
algorithm Q and the directed loop algorithm Q in the 
stochastic series expansion representation We have 
compared the efficiency of the present algorithm with 
that of the directed loop algorithm for spin chains and for 
the Bose-Hubbard chain. Especially when there are large 
diagonal matrix elements, the present worm algorithm is 
very successful. We have shown that choosing the locally 
optimal matrix for the transition matrices occurring in 
the stochastic subprocesses yields an efficient algorithm. 
We found that the loop size increases linearly with the in- 
crease in area in space-time, and that the dynamic expo- 
nent equals z = 0.84 ±4 for an isotropic Heisenberg chain 
without magnetic field. Seen the efficiency of the method 
and its advantageous scaling properties, the algorithm is 
suitable for large scale calculations of spin systems and 
soft-core bosonic models. 
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